Abstract. This paper develops theoretical results regarding noisy 1-bit compressed sensing and
sparse binomial regression. We demonstrate that a single convex program gives an accurate estimate
of the signal, or coefficient vector, for both of these models. We show that an s-sparse signal in $\mathbb{R}^n$
can be accurately estimated from $m = O(s \log(n/s))$ single-bit measurements using a simple convex
program. This remains true even if almost half of the measurements are randomly flipped. Worst-case
(adversarial) noise can also be accounted for, and uniform results that hold for all sparse inputs
are derived as well. In the terminology of sparse logistic regression, we show that $O(s \log(n/s))$
Bernoulli trials are sufficient to estimate a coefficient vector in $\mathbb{R}^n$ which is approximately s-sparse.
Moreover, the same convex program works for virtually all generalized linear models, in which the 
link function may be unknown. To our knowledge, these are the first results that tie together the 
theory of sparse logistic regression to 1-bit compressed sensing. Our results apply to general signal 
structures aside from sparsity; one only needs to know the size of the set K where signals reside. 
The size is given by the mean width of K, a computable quantity whose square serves as a robust 
extension of the dimension. 



1. Introduction 

1.1. One-bit compressed sensing. In modern data analysis, a pervasive challenge is to recover
extremely high-dimensional signals from seemingly inadequate amounts of data. Research in this
direction is being conducted in several areas including compressed sensing, sparse approximation
and low-rank matrix recovery. The key is to take into account the signal structure, which in essence
reduces the dimension of the signal space. In compressed sensing and sparse approximation, this
structure is sparsity: we say that a vector in $\mathbb{R}^n$ is s-sparse if it has s nonzero entries. In low-rank
matrix recovery, one restricts to matrices with low rank.

The standard assumption in these fields is that one has access to linear measurements of the 
form 

(1.1) $y_i = \langle a_i, x\rangle$, $i = 1, 2, \dots, m$,

where $a_1, a_2, \dots, a_m \in \mathbb{R}^n$ are known measurement vectors and $x \in \mathbb{R}^n$ is the signal to be recovered.
Typical compressed sensing results state that when $a_i$ are iid random vectors drawn from a certain
distribution (e.g. Gaussian), $m \sim s \log(n/s)$ measurements suffice for robust recovery of s-sparse
signals $x$, see [6].

In the recently introduced problem of 1-bit compressed sensing [3], the measurements are no
longer linear but rather consist of single bits. If there is no noise, the measurements are modeled as

(1.2) $y_i = \mathrm{sign}(\langle a_i, x\rangle)$, $i = 1, 2, \dots, m$,

where $\mathrm{sign}(x) = 1$ if $x \ge 0$ and $\mathrm{sign}(x) = -1$ if $x < 0$. On top of this, noise may be introduced as
random or adversarial bit flips.

The 1-bit measurements are meant to model quantization in the extreme case. It is interesting 
to note that when the signal to noise ratio is low, numerical experiments demonstrate that such 
extreme quantization can be optimal [14]. The webpage http://dsp.rice.edu/1bitCS/ is dedicated
to the rapidly growing literature on 1-bit compressed sensing. Further discussion of this
recent literature will be given in Section 3.1; we note for now that this paper presents the first 
theoretical accuracy guarantees in the noisy problem using a polynomial-time solver (given by a 
convex program). 

1.2. Noisy one-bit measurements. We propose the following general model for noisy 1-bit compressed
sensing. We assume that the measurements, or response variables, $y_i \in \{-1, 1\}$ are drawn
independently at random satisfying

(1.3) $\mathbb{E}\, y_i = \theta(\langle a_i, x\rangle)$, $i = 1, 2, \dots, m$,

where $\theta$ is some function, which automatically must satisfy $-1 \le \theta(z) \le 1$. A key point in our
results is that $\theta$ may be unknown or unspecified; one only needs to know the measurements $y_i$ and
the measurement vectors $a_i$ in order to recover $x$.

In compressed sensing it is typical to choose the measurement vectors $a_i$ at random, see [6]. In
this paper, we choose $a_i$ to be independent standard Gaussian random vectors in $\mathbb{R}^n$. Although
this assumption can be relaxed to allow for correlated coordinates (see Section 3.4), discrete
distributions are not permitted. Indeed, unlike traditional compressed sensing, accurate noiseless 1-bit
compressed sensing is provably impossible for some discrete distributions of $a_i$ (e.g. for Bernoulli
distribution, see [20]). Summarizing, the model (1.3) has two sources of randomness:

(1) the measurement vectors $a_i$ are independent standard Gaussian random vectors;

(2) given $\{a_i\}$, the measurements $y_i$ are independent $\{-1, 1\}$ valued random variables.

Note that (1.3) is the generalized linear model in statistics, and $\theta$ is known as the inverse of
the link function; the particular choice $\theta(z) = \tanh(z/2)$ corresponds to logistic regression. The
statisticians may prefer to switch $x$ with $\beta$, $a_i$ with $x_i$, $n$ with $p$ and $m$ with $n$, but we prefer to
keep our notation which is standard in compressed sensing.

Notice that in the noiseless 1-bit compressed sensing model (1.2), all information about the
magnitude of $x$ is lost in the measurements. Similarly, in the noisy model (1.3) the magnitude of $x$ may
be absorbed into the definition of $\theta$. Thus, our goal will be to estimate the projection of $x$ onto
the Euclidean sphere, $x / \|x\|_2$. Without loss of generality, we thus assume that $\|x\|_2 = 1$ in most
of our discussion that follows.

We shall make a single assumption on the function $\theta$ defining the model (1.3), namely that

(1.4) $\mathbb{E}\, \theta(g) g =: \lambda > 0$

where $g$ is a standard normal random variable. To see why this assumption is natural, notice that
$\langle a_i, x\rangle \sim N(0, 1)$ since $a_i$ are standard Gaussian random vectors and $\|x\|_2 = 1$; thus

$\mathbb{E}\, y_i \langle a_i, x\rangle = \mathbb{E}\, \theta(g) g = \lambda$.

Thus our assumption is simply that the 1-bit measurements $y_i$ are positively correlated with the
corresponding linear measurements $\langle a_i, x\rangle$.^2 The standard 1-bit compressed sensing (1.2) is a
partial case of model (1.3) with $\theta(z) = \mathrm{sign}(z)$; in this case we have $\lambda = \mathbb{E}|g| = \sqrt{2/\pi}$.

^1 For concreteness, we set $\mathrm{sign}(0) = 1$; this choice is arbitrary and could be replaced with $\mathrm{sign}(0) = -1$.
^2 If $\mathbb{E}\, \theta(g) g < 0$, we could replace $y_i$ with $-y_i$ to change the sign; thus our assumption is really that the correlation
is non-zero: $\mathbb{E}\, \theta(g) g \ne 0$.

1.3. The signal set. To describe the structure of possible signals, we assume that $x$ lies in some
set $K \subset B_2^n$ where $B_2^n$ denotes the Euclidean unit ball in $\mathbb{R}^n$. A key characteristic of the size of the
signal set is its mean width $w(K)$, defined as

(1.5) $w(K) := \mathbb{E} \sup_{x \in K - K} \langle g, x\rangle$

where $g$ is a standard Gaussian random vector in $\mathbb{R}^n$ and $K - K$ denotes the Minkowski difference.^3
An intuitive explanation of the mean width, its basic properties and simple examples are given in
Section 2. The important point is that $w(K)^2$ can serve as the effective dimension of $K$.

The main example of interest is where $K$ encodes sparsity. If $K = K_{n,s}$ is the convex hull of the
unit s-sparse vectors in $\mathbb{R}^n$, the mean width of this set computed in (2.2) and (3.3) is

(1.6) $w(K_{n,s}) \sim (s \log(n/s))^{1/2}$.

1.4. Main results. We propose the following solver to estimate the signal $x$ from the 1-bit
measurements $y_i$. It is given by the optimization problem

(1.7) $\max \sum_{i=1}^m y_i \langle a_i, x'\rangle$ subject to $x' \in K$.

This can be described even more compactly as

(1.8) $\max \langle y, Ax'\rangle$ subject to $x' \in K$

where $A$ is the $m \times n$ measurement matrix with rows $a_i$ and $y = (y_1, \dots, y_m)$ is the vector of 1-bit
measurements. In words, the program finds the vector in $K$ that maximizes the correlation of the
data $y$ with the (unknown) linear measurements $Ax'$.

If the set K is convex, (1.7) is a convex program, and therefore it can be solved algorithmically 
efficiently. This is the situation we will mostly care about, although our results below apply for 
general, non-convex signal sets K as well. 
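
As an illustration of the solver (1.7), consider the simplest convex signal set, the whole unit ball $K = B_2^n$. There the maximizer of the linear functional $\langle y, Ax'\rangle = \langle A^T y, x'\rangle$ is simply $A^T y / \|A^T y\|_2$, by the Cauchy-Schwarz inequality. The following is a minimal Python sketch under that assumption; the function names, the dimensions and the random bit-flip noise model are illustrative choices, not taken from the paper.

```python
import math
import random


def estimate_on_unit_ball(rows, y):
    """Maximize sum_i y_i <a_i, x'> over the unit Euclidean ball.

    The maximizer of a linear functional <v, x'> over the ball is v / ||v||_2,
    where v = A^T y is the sum of the sign-weighted measurement vectors.
    """
    n = len(rows[0])
    v = [0.0] * n
    for a, bit in zip(rows, y):
        for j in range(n):
            v[j] += bit * a[j]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def simulate(n=20, m=2000, flip_prob=0.2, seed=0):
    """Return the squared error ||x_hat - x||_2^2 for one random instance."""
    rng = random.Random(seed)
    # Random unit-norm signal (illustrative; any unit vector would do).
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x_norm = math.sqrt(sum(t * t for t in x))
    x = [t / x_norm for t in x]
    # Gaussian measurement vectors and 1-bit measurements with random flips.
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = []
    for a in rows:
        dot = sum(ai * xi for ai, xi in zip(a, x))
        bit = 1.0 if dot >= 0 else -1.0
        if rng.random() < flip_prob:
            bit = -bit
        y.append(bit)
    x_hat = estimate_on_unit_ball(rows, y)
    return sum((u - t) ** 2 for u, t in zip(x_hat, x))


if __name__ == "__main__":
    print("squared error:", simulate())
```

Note that the estimator never uses the flip probability: the same computation applies whatever the noise model is, which is the feature of (1.7) emphasized in the text.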

Theorem 1.1 (Estimating a fixed signal under random noise). Let $a_1, \dots, a_m$ be independent
standard Gaussian random vectors in $\mathbb{R}^n$, and let $K$ be a subset of the unit Euclidean ball in $\mathbb{R}^n$.
Fix $x \in K$ satisfying $\|x\|_2 = 1$. Assume that the measurements $y_1, \dots, y_m$ follow the model above.^4
Then for each $\beta > 0$, with probability at least $1 - 4\exp(-2\beta^2)$ the solution $\hat{x}$ to the optimization
problem (1.7) satisfies

$\|\hat{x} - x\|_2^2 \le \frac{8}{\lambda \sqrt{m}} (w(K) + \beta)$.



As an immediate consequence, we see that the signal $x \in K$ can be effectively estimated from
$m = O(w(K)^2)$ one-bit noisy measurements. The following result makes this statement precise.

Corollary 1.2 (Number of measurements). Let $\delta > 0$ and suppose that

$m \ge C \delta^{-2} w(K)^2$.

Then, under the assumptions of Theorem 1.1, with probability at least $1 - 8\exp(-c\delta^2 m)$ the solution
$\hat{x}$ to the optimization problem (1.7) satisfies

$\|\hat{x} - x\|_2^2 \le \delta / \lambda$.

^3 Specifically, $K - K = \{x - y : x, y \in K\}$.

^4 Specifically, our assumptions are that $y_i$ are $\{-1, 1\}$ valued random variables that are independent given $\{a_i\}$,
and that (1.3) holds with some function $\theta$ satisfying (1.4).

Here and in the rest of the paper, $C$ and $c$ denote positive absolute constants whose values may
change from instance to instance.

Theorem 1.1 is concerned with an arbitrary but fixed signal $x \in K$, and with a stochastic model
on the noise in the measurements. We will show how to strengthen these results to cover all signals
$x \in K$ uniformly, and to allow for a worst-case (adversarial) noise. Such noise can be modeled as
flipping some fixed percentage of arbitrarily chosen bits, and it can be measured using the Hamming
distance $d_H(\tilde{y}, y) = \sum_{i=1}^m \mathbf{1}_{\{\tilde{y}_i \ne y_i\}}$ between $\tilde{y}, y \in \{-1, 1\}^m$.

Theorem 1.3 (Uniform estimation, adversarial noise). Let $a_1, \dots, a_m$ be independent standard
Gaussian random vectors in $\mathbb{R}^n$, and let $K$ be a subset of the unit Euclidean ball in $\mathbb{R}^n$. Let $\delta > 0$
and suppose that

(1.9) $m \ge C \delta^{-6} w(K)^2$.

Then with probability at least $1 - 8\exp(-c\delta^2 m)$, the following event occurs. Consider a signal
$x \in K$ satisfying $\|x\|_2 = 1$ and its (unknown) uncorrupted 1-bit measurements $y = (y_1, \dots, y_m)$
given as

$y_i = \mathrm{sign}(\langle a_i, x\rangle)$, $i = 1, 2, \dots, m$.

Let $\tilde{y} = (\tilde{y}_1, \dots, \tilde{y}_m) \in \{-1, 1\}^m$ be any (corrupted) measurements satisfying $d_H(\tilde{y}, y) \le \tau m$. Then
the solution $\hat{x}$ to the optimization problem (1.7) with input $\tilde{y}$ satisfies

(1.10) $\|\hat{x} - x\|_2^2 \le \delta \sqrt{\log(e/\delta)} + 11 \tau \sqrt{\log(e/\tau)}$.

This uniform result will follow from a deeper analysis than the fixed-signal result, Theorem 1.1.
Its proof will be based on the recent results from [19] on random hyperplane tessellations of $K$.

Remark 1.4 (Sparse estimation). A remarkable example is for s-sparse signals in $\mathbb{R}^n$. Recalling the
mean width estimate (1.6), we see that our results above imply that an s-sparse signal in $\mathbb{R}^n$ can
be effectively estimated from $m = O(s \log(n/s))$ one-bit noisy measurements. We will make this
statement precise in Corollary 3.1 and the remark after it.

Remark 1.5 (Encoding and decoding: algorithmic embeddings into the Hamming cube). Let us put
Theorem 1.3 in the context of coding in information theory. In the earlier paper [19] we proved
that $K \cap S^{n-1}$ can be almost isometrically embedded into the Hamming cube $\{-1, 1\}^m$, with the
same $m$ and same probability bound as in Theorem 1.3. Specifically, one has

(1.11) $\left| \frac{1}{\pi} d_G(x, x') - \frac{1}{m} d_H(\mathrm{sign}(Ax), \mathrm{sign}(Ax')) \right| \le \delta$ for all $x, x' \in K \cap S^{n-1}$

where $d_G$ and $d_H$ denote the geodesic distance in $S^{n-1}$ and the Hamming distance in $\{-1, 1\}^m$
respectively, see Theorem 6.3 below. Thus the embedding $K \cap S^{n-1} \to \{-1, 1\}^m$ is given by the
map^5

$x \mapsto \mathrm{sign}(Ax)$.

This map encodes a given signal $x \in K$ into a binary string $y = \mathrm{sign}(Ax)$. Conversely, one can
accurately and robustly decode $x$ from $y$ by solving the optimization problem (1.7). This is the
content of Theorem 1.3.
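
The near-isometry behind this embedding can be checked numerically for a single pair of points: for a Gaussian vector $a$, the signs of $\langle a, x\rangle$ and $\langle a, x'\rangle$ differ with probability equal to the angle between $x$ and $x'$ divided by $\pi$, so the normalized Hamming distance concentrates around the normalized geodesic distance. The sketch below compares the two quantities; the dimensions, the seed and the function names are hypothetical, and it tests one fixed pair rather than the uniform statement over all of $K \cap S^{n-1}$.

```python
import math
import random


def random_unit_vector(n, rng):
    """Draw a point uniformly from the unit sphere by normalizing a Gaussian."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def sign(t):
    """Sign convention with sign(0) = 1."""
    return 1 if t >= 0 else -1


def compare_distances(n=10, m=4000, seed=1):
    """Return (geodesic distance / pi, Hamming distance / m) for one pair."""
    rng = random.Random(seed)
    x = random_unit_vector(n, rng)
    x_prime = random_unit_vector(n, rng)
    cosine = sum(u * v for u, v in zip(x, x_prime))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding
    geodesic = math.acos(cosine)
    hamming = 0
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        bit = sign(sum(ai * xi for ai, xi in zip(a, x)))
        bit_prime = sign(sum(ai * xi for ai, xi in zip(a, x_prime)))
        if bit != bit_prime:
            hamming += 1
    return geodesic / math.pi, hamming / m


if __name__ == "__main__":
    g, h = compare_distances()
    print("geodesic/pi = %.4f, hamming/m = %.4f" % (g, h))
```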

Remark 1.6 (Optimality). While the dependence on the mean width $w(K)$ in the results above
seems to be optimal (see [19] for a discussion), the dependence on the accuracy $\delta$ is most likely not
optimal. We are not trying to optimize dependence on $\delta$ in this paper, but are leaving this as an
open problem.



^5 The sign function is applied to each coordinate of $Ax$.

Theorem 1.3 can be extended to allow for a random noise together with adversarial noise; this
is discussed in Remark 4.4 below.

1.5. Organization. An intuitive discussion of the mean width along with the estimate (1.6) of
the mean width when $K$ encodes sparse vectors is given in Section 2. In Section 3 we specialize
our results to a variety of (approximately) sparse signal models: 1-bit compressed sensing, sparse
logistic regression and low-rank matrix recovery. In Subsection 3.4, we extend our results to allow
for correlations in the entries of the measurement vectors.

The proofs of our main results, Theorems 1.1 and 1.3, are given in Sections 4-6. In Section 4
we quickly reduce these results to the two concentration inequalities that hold uniformly over the
set $K$, Propositions 4.2 and 4.3 respectively. Proposition 4.2 is proved in Section 5 using standard
techniques of probability in Banach spaces. The proof of Proposition 4.3 is deeper; it is based on
the recent work of the authors [19] on random hyperplane tessellations. The argument is given in
Section 6.

1.6. Notation. We write $a \sim b$ if $ca \le b \le Ca$ for some positive absolute constants $c, C$ ($a$ and
$b$ may have dimensional dependence). In order to increase clarity, vectors are written in lower
case bold italics (e.g., $\boldsymbol{g}$), and matrices are upper case bold italics (e.g., $\boldsymbol{A}$). We let $\boldsymbol{g}$ denote a
standard Gaussian random vector whose length will be clear from context; $g$ denotes a standard
normal random variable. $C, c$ will denote positive absolute constants whose values may change from
instance to instance. Given a vector $v$ in $\mathbb{R}^n$ and a subset $T \subset \{1, \dots, n\}$, we denote by $v_T \in \mathbb{R}^T$
the restriction of $v$ onto the coordinates in $T$.

$B_2^n$ and $S^{n-1}$ denote the unit Euclidean ball and sphere in $\mathbb{R}^n$ respectively, and $B_1^n$ denotes the
unit ball with respect to the $\ell_1$ norm. The Euclidean and $\ell_1$ norms of a vector $v$ are denoted $\|v\|_2$ and
$\|v\|_1$ respectively. The number of non-zero entries of $v$ is denoted $\|v\|_0$. The operator norm (the
largest singular value) of a matrix $A$ is denoted $\|A\|$.

2. Mean width and sparsity 

2.1. Mean width. In this section we explain the geometric meaning of the mean width of a set
$K \subset \mathbb{R}^n$, which was defined by the formula (1.5), and discuss its basic properties and examples.

The notion of mean width plays a significant role in asymptotic convex geometry (see e.g. [7]).
The width of $K$ in the direction of $\eta \in S^{n-1}$ is the smallest width of the slab between two parallel
hyperplanes with normals $\eta$ that contains $K$. Analytically, the width can be expressed as

$\sup_{u \in K} \langle \eta, u\rangle - \inf_{v \in K} \langle \eta, v\rangle = \sup_{x \in K - K} \langle \eta, x\rangle$,

see Figure 1. Averaging over $\eta$ uniformly distributed in $S^{n-1}$, we obtain the spherical mean width:

$\tilde{w}(K) := \mathbb{E} \sup_{x \in K - K} \langle \eta, x\rangle$.




Figure 1. Width of a set $K$ in the direction of $\eta$ is illustrated by the dashed line.

Instead of averaging using $\eta \in S^{n-1}$, it is often more convenient to use a standard Gaussian
random vector $g \in \mathbb{R}^n$. This gives the definition (1.5) of the Gaussian mean width of $K$:

$w(K) := \mathbb{E} \sup_{x \in K - K} \langle g, x\rangle$.

In this paper we shall use the Gaussian mean width, which we call the "mean width" for brevity.
Note that the spherical and Gaussian versions of mean width are proportional to each other.
Indeed, by rotation invariance we can realize $\eta$ as $\eta = g / \|g\|_2$ and note that $\eta$ is independent
of the magnitude factor $\|g\|_2$. It follows that $w(K) = \mathbb{E} \|g\|_2 \cdot \tilde{w}(K)$. Further, one can use that
$\mathbb{E} \|g\|_2 \sim \sqrt{n}$ and obtain the useful comparison of Gaussian and spherical versions of mean width:

$w(K) \sim \sqrt{n} \cdot \tilde{w}(K)$.

Let us record some further simple but useful properties of the mean width. 

Proposition 2.1 (Mean width). The mean width of a subset $K \subset \mathbb{R}^n$ has the following properties.

1. The mean width is invariant under orthogonal transformations and translations.

2. The mean width is invariant under taking the convex hull, i.e. $w(K) = w(\mathrm{conv}(K))$.

3. We have

$w(K) = \mathbb{E} \sup_{x \in K - K} |\langle g, x\rangle|$.

4. Denoting the diameter of $K$ in the Euclidean metric by $\mathrm{diam}(K)$, we have

$\sqrt{2/\pi}\, \mathrm{diam}(K) \le w(K) \le n^{1/2}\, \mathrm{diam}(K)$.

5. We have

$w(K) \le 2\, \mathbb{E} \sup_{x \in K} \langle g, x\rangle \le 2\, \mathbb{E} \sup_{x \in K} |\langle g, x\rangle|$.

For an origin-symmetric set $K$, both these inequalities become equalities.

6. The inequalities in part 5 can be essentially reversed for arbitrary $K$:

$w(K) \ge \mathbb{E} \sup_{x \in K} |\langle g, x\rangle| - \sqrt{2/\pi}\, \mathrm{dist}(0, K)$.

Here $\mathrm{dist}(0, K) = \inf_{x \in K} \|x\|_2$ is the Euclidean distance from the origin to $K$. In particular, if
$0 \in K$ then one has $w(K) \ge \mathbb{E} \sup_{x \in K} |\langle g, x\rangle|$.

Proof. Parts 1, 2 and 5 are obvious by definition; part 3 follows by the symmetry of $K - K$.
To prove part 4, note that for every $x_0 \in K - K$ one has

(2.1) $w(K) \ge \mathbb{E} |\langle g, x_0\rangle| = \sqrt{2/\pi}\, \|x_0\|_2$.

The equality here follows because $\langle g, x_0\rangle$ is a normal random variable with variance $\|x_0\|_2^2$. This
yields the lower bound in part 4. For the upper bound, we can use part 3 along with the bound
$|\langle g, x\rangle| \le \|g\|_2 \|x\|_2 \le \|g\|_2 \cdot \mathrm{diam}(K)$ for all $x \in K - K$. This gives

$w(K) \le \mathbb{E} \|g\|_2 \cdot \mathrm{diam}(K) \le (\mathbb{E} \|g\|_2^2)^{1/2}\, \mathrm{diam}(K) = n^{1/2}\, \mathrm{diam}(K)$.

To prove part 6, let us start with the special case where $0 \in K$. Then $K - K \supset K \cup (-K)$, thus
$w(K) \ge \mathbb{E} \sup_{x \in K \cup (-K)} \langle g, x\rangle = \mathbb{E} \sup_{x \in K} |\langle g, x\rangle|$ as claimed. Next, consider a general case. Fix
$x_0 \in K$ and apply the previous reasoning for the set $K - x_0 \ni 0$. Using parts 1 and 3 we obtain

$w(K) = w(K - x_0) \ge \mathbb{E} \sup_{x \in K} |\langle g, x - x_0\rangle| \ge \mathbb{E} \sup_{x \in K} |\langle g, x\rangle| - \mathbb{E} |\langle g, x_0\rangle|$.

Finally, as in (2.1) we note that $\mathbb{E} |\langle g, x_0\rangle| = \sqrt{2/\pi}\, \|x_0\|_2$. Minimizing over $x_0 \in K$ completes the
proof. □



Example. For illustration, let us evaluate the mean width of some sets $K \subset \mathbb{R}^n$.

1. If $K = B_2^n$ or $K = S^{n-1}$ then $w(K) = \mathbb{E} \|g\|_2 \le (\mathbb{E} \|g\|_2^2)^{1/2} = \sqrt{n}$ (and in fact $w(K) \sim \sqrt{n}$).

2. If the linear algebraic dimension $\dim(K) = k$ then $w(K) \le \sqrt{k}$.

3. If $K$ is a finite set, then $w(K) \le C \sqrt{\log |K|}$.

Remark 2.2 (Effective dimension). The square of the mean width, $w(K)^2$, may be interpreted as
the effective dimension of a set $K \subset \mathbb{R}^n$. It is always bounded by the linear algebraic dimension
(see the example above), but it has the advantage of robustness: a small perturbation of $K$ leads
to a small change in $w(K)^2$.

In this light, the invariance of the mean width under taking the convex hull (Proposition 2.1, 
part 2) is especially useful in compressed sensing, where a usual tactic is to relax the non-convex 
program to a convex program. It is important that in the course of this relaxation, the "effective 
dimension" of the signal set K remains the same. 

Mean width of a given set K can be computed using several tools from probability in Banach 
spaces. These include Dudley's inequality, Sudakov minoration, the Gaussian concentration in- 
equality, Slepian's inequality and the sharp technique of majorizing measures and generic chaining 
[16, 22]. 

2.2. Sparse signal set. The quintessential signal structure considered in this paper is sparsity.
Thus for given $n \in \mathbb{N}$ and $0 < s \le n$, we consider the set

$S_{n,s} = \{x \in \mathbb{R}^n : \|x\|_0 \le s,\ \|x\|_2 \le 1\}$.

In words, $S_{n,s}$ consists of s-sparse (or sparser) vectors with length $n$ whose Euclidean norm is
bounded by 1.

Although the linear algebraic dimension of $S_{n,s}$ is $n$ (as this set spans $\mathbb{R}^n$), the dimension of
$S_{n,s} \cap \{x \in \mathbb{R}^n : \|x\|_0 = s\}$ as a manifold with boundary embedded in $\mathbb{R}^n$ is $s$.^6 It turns out that the
"effective dimension" of $S_{n,s}$ given by the square of its mean width is much closer to the manifold
dimension $s$ than to the linear algebraic dimension $n$:

Lemma 2.3 (Mean width of the sparse signal set). We have

(2.2) $c\, s \log(2n/s) \le w^2(S_{n,s}) \le C\, s \log(2n/s)$.

Proof. Let us prove the upper bound. Without loss of generality we can assume that $s \in \mathbb{N}$. By
representing $S_{n,s}$ as the union of $\binom{n}{s}$ s-dimensional unit Euclidean balls we see that

$w(S_{n,s}) = \mathbb{E} \max_{|T| = s} \|g_T\|_2$.

For each $T$, the Gaussian concentration inequality (see Theorem 5.2 below) yields

$\mathbb{P}\{\|g_T\|_2 \ge \mathbb{E} \|g_T\|_2 + t\} \le \exp(-t^2/2)$, $t > 0$.

Next, $\mathbb{E} \|g_T\|_2 \le (\mathbb{E} \|g_T\|_2^2)^{1/2} = \sqrt{s}$. Thus the union bound gives

$\mathbb{P}\{\max_{|T| = s} \|g_T\|_2 \ge \sqrt{s} + t\} \le \binom{n}{s} \exp(-t^2/2)$, $t > 0$.

^6 Thus $S_{n,s}$ is the union of $s + 1$ manifolds with boundary each of whose dimension is bounded by $s$.

Note that $\binom{n}{s} \le \exp(s \log(en/s))$; integrating gives the desired upper bound in (2.2).

The lower bound in (2.2) follows from Sudakov minoration (see Theorem 6.1) combined with
finding a tight lower bound on the covering number of $S_{n,s}$. Since the lower bound will not be used
in this paper, we leave the details to the interested reader. □
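
The scaling in (2.2) can be sanity-checked by simulation. The quantity $\mathbb{E} \max_{|T| = s} \|g_T\|_2$ from the proof is the expected Euclidean norm of the $s$ largest-magnitude coordinates of a Gaussian vector, which is easy to estimate by Monte Carlo. The sketch below is only an illustration: the sample sizes, seed and function name are arbitrary choices, and the experiment says nothing about the values of the absolute constants $c$ and $C$.

```python
import math
import random


def sparse_mean_width(n, s, trials=300, seed=0):
    """Monte Carlo estimate of E max_{|T|=s} ||g_T||_2 for g standard Gaussian in R^n.

    The maximum over supports T of size s is attained by the s coordinates of g
    with the largest squares, so no enumeration of supports is needed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        squares = sorted((rng.gauss(0.0, 1.0) ** 2 for _ in range(n)), reverse=True)
        total += math.sqrt(sum(squares[:s]))
    return total / trials


if __name__ == "__main__":
    n, s = 200, 5
    width = sparse_mean_width(n, s)
    print("estimated width^2      :", width ** 2)
    print("s * log(2n/s)          :", s * math.log(2 * n / s))
    print("ambient dimension n    :", n)
```

For $s$ much smaller than $n$ the squared estimate stays within a constant factor of $s \log(2n/s)$ and far below the ambient dimension $n$, in line with the "effective dimension" interpretation.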



3. Applications to sparse signal models 

Our main results stated in the introduction are valid for general signal sets $K$. Now we specialize
to the cases where $K$ encodes sparsity. It would be ideal if we could take $K = S_{n,s}$, but this set
would not be convex and thus the solver (1.7) would not be known to run in polynomial time.
We instead take a convex relaxation of $S_{n,s}$, an effective tactic from the sparsity literature. Notice
that if $x \in S_{n,s}$ then $\|x\|_1 \le \sqrt{s}$ by the Cauchy-Schwarz inequality. This motivates us to consider the
convex set

$K_{n,s} = \{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_1 \le \sqrt{s}\} = B_2^n \cap \sqrt{s} B_1^n$.

$K_{n,s}$ is almost exactly the convex hull of $S_{n,s}$, as is shown in [20]:

(3.1) $\mathrm{conv}(S_{n,s}) \subset K_{n,s} \subset 2\, \mathrm{conv}(S_{n,s})$.

$K_{n,s}$ can be thought of as a set of approximately sparse vectors.

If the signal is known to be exactly or approximately sparse, i.e. $x \in K_{n,s}$, we may estimate $x$
by solving the convex program

(3.2) $\max \sum_{i=1}^m y_i \langle a_i, x'\rangle$ subject to $\|x'\|_1 \le \sqrt{s}$ and $\|x'\|_2 \le 1$.

This is just a restatement of the program (1.7) for the set $K_{n,s}$.
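
A program of the form (3.2) can be handed to any general-purpose convex solver. For illustration only, the sketch below uses a different route that is not taken from the paper: since the objective is the linear functional $\langle v, x'\rangle$ with $v = A^T y$, the optimality (KKT) conditions suggest a maximizer of the form $S_\tau(v) / \|S_\tau(v)\|_2$, where $S_\tau$ is coordinatewise soft thresholding and $\tau \ge 0$ is the smallest threshold for which the $\ell_1$ constraint holds; $\tau$ is located by bisection. The function names, problem sizes and noise model are hypothetical, and the code assumes a generic $v$ (nonzero, without ties among its largest entries).

```python
import math
import random


def soft_threshold(v, tau):
    """Shrink every coordinate of v toward zero by tau."""
    return [math.copysign(max(abs(t) - tau, 0.0), t) for t in v]


def l1_l2(u):
    """Return the l1 and l2 norms of u."""
    return sum(abs(t) for t in u), math.sqrt(sum(t * t for t in u))


def solve_sparse_program(v, s, iters=60):
    """Maximize <v, x'> subject to ||x'||_1 <= sqrt(s) and ||x'||_2 <= 1."""
    top = max(abs(t) for t in v)
    if top == 0.0:
        return [0.0] * len(v)
    root_s = math.sqrt(s)
    l1, l2 = l1_l2(v)
    if l1 / l2 <= root_s:          # l1 constraint inactive: plain normalization
        return [t / l2 for t in v]
    lo, hi = 0.0, top              # the l1/l2 ratio decreases as tau grows
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        l1, l2 = l1_l2(soft_threshold(v, mid))
        if l2 == 0.0 or l1 / l2 <= root_s:
            hi = mid
        else:
            lo = mid
    u = soft_threshold(v, hi)
    l1, l2 = l1_l2(u)
    if l2 == 0.0:                  # degenerate case (ties); fall back to lo
        u = soft_threshold(v, lo)
        l1, l2 = l1_l2(u)
    return [t / l2 for t in u]


def simulate(n=50, s=3, m=500, p=0.9, seed=0):
    """One instance with each bit correct with probability p; returns (x, x_hat)."""
    rng = random.Random(seed)
    x = [1.0 / math.sqrt(s) if j < s else 0.0 for j in range(n)]
    v = [0.0] * n
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        bit = 1.0 if sum(ai * xi for ai, xi in zip(a, x)) >= 0 else -1.0
        if rng.random() > p:
            bit = -bit
        for j in range(n):
            v[j] += bit * a[j]     # accumulate v = A^T y
    return x, solve_sparse_program(v, s)


if __name__ == "__main__":
    x, x_hat = simulate()
    print("correlation <x_hat, x>:", sum(u * t for u, t in zip(x_hat, x)))
```

The only inputs are the measurement vectors, the bits and the sparsity bound $s$; neither the flip probability nor any other description of the noise enters the computation.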

Theorems 1.1 and 1.3 are supposed to guarantee that $x$ can indeed be estimated by a solution
to (3.2). But in order to apply these results, we need to know the mean width of $K_{n,s}$. A good
bound for it follows from (3.1) and Lemma 2.3, which give

(3.3) $w(K_{n,s}) \le 2 w(\mathrm{conv}(S_{n,s})) = 2 w(S_{n,s}) \le C \sqrt{s \log(2n/s)}$.

This yields the following version of Corollary 1.2.

Corollary 3.1 (Estimating an approximately sparse signal). Let $a_1, \dots, a_m$ be independent
standard Gaussian random vectors in $\mathbb{R}^n$, and fix $x \in K_{n,s}$ satisfying $\|x\|_2 = 1$. Assume that the
measurements $y_1, \dots, y_m$ follow the model from Section 1.2.^7 Let $\delta > 0$ and suppose that

$m \ge C \delta^{-2} s \log(2n/s)$.

Then, with probability at least $1 - 8\exp(-c\delta^2 m)$, the solution $\hat{x}$ to the convex program (3.2) satisfies

$\|\hat{x} - x\|_2^2 \le \delta / \lambda$.

Remark 3.2. In a similar way, one can also specialize the uniform result, Theorem 1.3, to the
approximately sparse case.



^7 Specifically, our assumptions were that $\{y_i\}$ are independent random variables that are jointly independent of
$\{a_i\}$, and that (1.3) holds with some function $\theta$ satisfying (1.4).

3.1. 1-bit compressed sensing. Corollary 3.1 can be easily specialized to various specific models
of noise. Let us consider some of the interesting models, and compute the correlation coefficient
$\lambda = \mathbb{E}\, \theta(g) g$ in (1.4) for each of them.

Noiseless 1-bit compressed sensing: In the classic noiseless model (1.2), the measurements
are given as $y_i = \mathrm{sign}(\langle a_i, x\rangle)$ and thus $\theta(z) = \mathrm{sign}(z)$. Thus

$\lambda = \mathbb{E} |g| = \sqrt{2/\pi}$.

Therefore, with high probability we obtain $\|\hat{x} - x\|_2^2 \le \delta$ provided that the number of
measurements is $m \ge C \delta^{-2} s \log(2n/s)$. This is similar to the results available in [20].
Random bit flips: Assume that each measurement $y_i$ is only correct with probability $p$, thus

$y_i = \xi_i\, \mathrm{sign}(\langle a_i, x\rangle)$, $i = 1, 2, \dots, m$

where $\xi_i$ are independent $\{-1, 1\}$ valued random variables with $\mathbb{P}\{\xi_i = 1\} = p$, which
represent random bit flips. Then $\theta(z) = \mathrm{sign}(z) \cdot \mathbb{E}\, \xi_i = 2\, \mathrm{sign}(z)(p - 1/2)$ and

$\lambda = 2(p - 1/2)\, \mathbb{E} |g| = 2\sqrt{2/\pi}\, (p - 1/2)$.

Therefore, with high probability we obtain $\|\hat{x} - x\|_2^2 \le \delta$ provided that the number of
measurements is $m \ge C \delta^{-2} (p - 1/2)^{-2} s \log(2n/s)$. Thus we obtain a surprising conclusion:

The signal $x$ can be estimated even if almost half of the measurements are corrupted.

Somewhat surprisingly, the estimation of $x$ is done by one simple convex program (3.2).
Of course, if at least half of the measurements are corrupted, recovery is impossible by any
algorithm.

Random noise before quantization: Assume that the measurements are given as

$y_i = \mathrm{sign}(\langle a_i, x\rangle + \nu_i)$, $i = 1, 2, \dots, m$

where $\nu_i$ are iid random variables representing noise added before quantization. This
situation is typical in analog-to-digital converters. It is also the latent variable selection model
from statistics.

Assume for simplicity that $\nu_i$ have density $f(x)$. Then $\theta(z) = 1 - 2\, \mathbb{P}\{\nu_i \le -z\}$, and the
correlation coefficient $\lambda = \mathbb{E}\, \theta(g) g$ can be evaluated using integration by parts, which gives

$\lambda = \mathbb{E}\, \theta'(g) = 2\, \mathbb{E} f(-g) > 0$.

A specific value of $\lambda$ is therefore not hard to estimate for concrete densities $f$. For instance,
if $\nu_i$ are normal random variables with mean zero and variance $\sigma^2$, then

$\lambda = \mathbb{E} \sqrt{\frac{2}{\pi \sigma^2}} \exp(-g^2 / 2\sigma^2) = \sqrt{\frac{2}{\pi(\sigma^2 + 1)}}$.

Therefore, with high probability we obtain $\|\hat{x} - x\|_2^2 \le \delta$ provided that the number of
measurements is $m \ge C \delta^{-2} (\sigma^2 + 1)\, s \log(2n/s)$. Thus we obtain an unexpected conclusion:

The signal $x$ can be estimated even when the noise level $\sigma$ eclipses the magnitude
of the linear measurements.

Indeed, the average magnitude of the linear measurements is $\mathbb{E} |\langle a_i, x\rangle| = \sqrt{2/\pi}$, while the
average noise level $\sigma$ can be much larger.
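
The three values of $\lambda$ computed above can be double-checked by evaluating $\lambda = \mathbb{E}\, \theta(g) g$ numerically. The sketch below does this with a simple midpoint rule against the standard normal density. The helper names, the integration window and the step size are illustrative assumptions; for Gaussian noise the sketch uses the identity $\theta(z) = 1 - 2\, \mathbb{P}\{\nu_i \le -z\} = \mathrm{erf}(z / (\sigma \sqrt{2}))$.

```python
import math


def gaussian_expectation(func, half_width=10.0, step=1e-3):
    """Midpoint-rule approximation of E func(g) for a standard normal g."""
    cells = int(round(2 * half_width / step))
    total = 0.0
    for k in range(cells):
        z = -half_width + (k + 0.5) * step
        total += func(z) * math.exp(-0.5 * z * z)
    return total * step / math.sqrt(2.0 * math.pi)


def correlation(theta):
    """The correlation coefficient lambda = E theta(g) g from (1.4)."""
    return gaussian_expectation(lambda z: theta(z) * z)


def sign(z):
    return 1.0 if z >= 0 else -1.0


def theta_bit_flips(p):
    """Each bit is correct with probability p."""
    return lambda z: (2.0 * p - 1.0) * sign(z)


def theta_gaussian_noise(sigma):
    """Gaussian noise of variance sigma^2 added before quantization."""
    return lambda z: math.erf(z / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    p, sigma = 0.6, 3.0
    print("noiseless :", correlation(sign), math.sqrt(2 / math.pi))
    print("bit flips :", correlation(theta_bit_flips(p)),
          2 * math.sqrt(2 / math.pi) * (p - 0.5))
    print("pre-noise :", correlation(theta_gaussian_noise(sigma)),
          math.sqrt(2 / (math.pi * (sigma ** 2 + 1))))
```

Each printed pair shows the numerical value next to the closed form derived in the text.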
Let us put these results in the perspective of the existing literature on 1-bit compressed sensing.
The problem of 1-bit compressed sensing, as introduced by Boufounos and Baraniuk in [3], is
the extreme version of quantized compressed sensing; it is particularly beneficial to consider 1-bit
measurements in analog-to-digital conversion (see the webpage http://dsp.rice.edu/1bitCS/).
Several numerical results are available, and there are a few recent theoretical results as well.

Suppose that $x \in \mathbb{R}^n$ is s-sparse. Gupta et al. [11] demonstrate that the support of $x$ can
tractably be recovered from either 1) $O(s \log n)$ nonadaptive measurements assuming a constant
dynamic range of $x$ (i.e. the magnitude of all nonzero entries of $x$ is assumed to lie between two
constants), or 2) $O(s \log n)$ adaptive measurements. Jacques et al. [12] demonstrate that any
consistent estimate of $x$ will be accurate provided that $m \ge O(s \log n)$. Here consistent means that
the estimate $\hat{x}$ should have unit norm, be at least as sparse as $x$, and agree with the measurements,
i.e. $\mathrm{sign}(\langle a_i, \hat{x}\rangle) = \mathrm{sign}(\langle a_i, x\rangle)$ for all $i$. These results of Jacques et al. [12] can be extended to
handle adversarial bit flips. The difficulty in applying these results is that the first two conditions
are nonconvex, and thus it is unknown whether there is a polynomial-time solver which is guaranteed
to return a consistent solution.

In a dual line of research, Gunturk et al. [9, 10] analyze sigma-delta quantization. The focus of
their results is to achieve an excellent dependence on the accuracy $\delta$ while minimizing the number
of bits per measurement. However the measurements $y_i$ in sigma-delta quantization are not related
to any linear measurements (unlike those in (1.2) and (1.3)) but are allowed to be constructed in a
judicious fashion (e.g. iteratively). Furthermore, in Gunturk et al. [9, 10] the number of bits per
measurement depends on the dynamic range of the nonzero part of $x$. Similarly, the recent work
of Ardestanizadeh et al. [1] requires a finite number of bits per measurement.

The noiseless 1-bit compressed sensing given by the model (1.2) was considered by the present
authors in the earlier paper [20], where the following convex program was introduced:

$\min \|x'\|_1$ subject to $y_i = \mathrm{sign}(\langle a_i, x'\rangle)$ and $\sum_{i=1}^m y_i \langle a_i, x'\rangle = m$.

This program was shown in [20] to accurately recover an s-sparse vector $x$ from $m = O(s \log(n/s)^2)$
measurements $y_i$. This result was the first to propose a polynomial-time solver for 1-bit compressed
sensing with provable accuracy guarantees. However, it was unclear how to modify the above convex
program to account for possible noise.

The present paper proposes to overcome this difficulty by considering the convex program (3.2) 
(and in the most general case, the optimization problem (1.7)). One may note that the program 
(3.2) requires the knowledge of a bound on the (approximate) sparsity level s. In return, it does 
not need to be adjusted depending on the kind of noise or level of noise. 

3.2. Sparse logistic regression. In order to give concrete results accessible to the statistics
community, we now specialize Corollary 1.2 to the logistic regression model. Further, we drop
the assumption that $\|x\|_2 = 1$ in this section; this will allow easier comparison with the related
literature (see below).

The simple logistic function is defined as

(3.4) $f(z) = \frac{e^z}{e^z + 1}$.

In the logistic regression model, the observations $y_i \in \{-1, 1\}$ are iid random variables satisfying

(3.5) $\mathbb{P}\{y_i = 1\} = f(\langle a_i, x\rangle)$, $i = 1, 2, \dots, m$.

Note that this is a partial case of the generalized linear model (1.3) with $\theta(z) = \tanh(z/2)$. We
thus have the following specialization of Corollary 1.2.

Corollary 3.3 (Sparse logistic regression). Let $a_1, \dots, a_m$ be independent standard Gaussian
random vectors in $\mathbb{R}^n$, and fix $x$ satisfying $x / \|x\|_2 \in K_{n,s}$. Assume that the observations $y_1, \dots, y_m$
follow the logistic regression model (3.5). Let $\delta > 0$ and suppose that

$m \ge C \delta^{-2} s \log(2n/s)$.

Then, with probability at least $1 - 8\exp(-c\delta^2 m)$, the solution $\hat{x}$ to the convex program (3.2) satisfies

(3.6) $\left\| \hat{x} - \frac{x}{\|x\|_2} \right\|_2^2 \le \delta \max(\|x\|_2^{-1}, 1)$.

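Proof. We begin by reducing to the case when $\|x\|_2 = 1$ by rescaling the logistic function. Thus,
let $\alpha = \|x\|_2$ and define the scaled logistic function $f_\alpha(x) = f(\alpha x)$. In particular,

$\mathbb{P}\{y_i = 1\} = f_\alpha\left(\left\langle a_i, \frac{x}{\|x\|_2} \right\rangle\right)$.

To apply Corollary 3.1, it suffices to compute the correlation coefficient $\lambda$ in (1.4). First, by
rescaling $f$ we have also rescaled $\theta$, so we consider $\theta(z) = \tanh(\alpha z / 2)$. We can now compute $\lambda$
using integration by parts:

$\lambda = \mathbb{E}\, \theta(g) g = \mathbb{E}\, \theta'(g) = \frac{\alpha}{2}\, \mathbb{E}\, \mathrm{sech}^2(\alpha g / 2)$.

To further bound this quantity below, we can use the fact that $\mathrm{sech}^2(x)$ is an even and decreasing
function for $x \ge 0$. This yields

$\lambda \ge \frac{\alpha}{2}\, \mathbb{P}\{|\alpha g / 2| \le 1/2\} \cdot \mathrm{sech}^2(1/2) \ge \frac{\alpha}{3} \cdot \mathbb{P}\{|g| \le 1/\alpha\} \ge \frac{1}{6} \min(\alpha, 1)$.

The result follows from Corollary 3.1 since $\alpha = \|x\|_2$. □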
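
The two steps of this computation, the integration by parts identity $\mathbb{E}\, \theta(g) g = \mathbb{E}\, \theta'(g)$ and a lower bound proportional to $\min(\alpha, 1)$, can be checked numerically for the scaled logistic model $\theta(z) = \tanh(\alpha z / 2)$. The sketch below is an illustration under stated assumptions: the midpoint-rule integrator and its parameters are arbitrary choices, the constant $1/6$ in the check is used as an illustrative value of the absolute constant, and $\alpha$ is kept moderate so that `math.cosh` does not overflow.

```python
import math


def gaussian_expectation(func, half_width=10.0, step=1e-3):
    """Midpoint-rule approximation of E func(g) for a standard normal g."""
    cells = int(round(2 * half_width / step))
    total = 0.0
    for k in range(cells):
        z = -half_width + (k + 0.5) * step
        total += func(z) * math.exp(-0.5 * z * z)
    return total * step / math.sqrt(2.0 * math.pi)


def logistic_lambda(alpha):
    """lambda = E theta(g) g for theta(z) = tanh(alpha z / 2)."""
    return gaussian_expectation(lambda z: math.tanh(alpha * z / 2.0) * z)


def logistic_lambda_by_parts(alpha):
    """The same quantity written as E theta'(g) = (alpha/2) E sech^2(alpha g / 2)."""
    return gaussian_expectation(
        lambda z: 0.5 * alpha / math.cosh(alpha * z / 2.0) ** 2
    )


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 10.0):
        lam = logistic_lambda(alpha)
        print("alpha = %5.1f  lambda = %.4f  by parts = %.4f  lambda/min(alpha,1) = %.3f"
              % (alpha, lam, logistic_lambda_by_parts(alpha), lam / min(alpha, 1.0)))
```

As $\alpha$ grows, the computed $\lambda$ approaches the noiseless value $\sqrt{2/\pi}$, consistent with the logistic model approaching noiseless 1-bit compressed sensing.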

Remark 3.4. Corollary 3.3 allows one to estimate the projection of $x$ onto the unit sphere. One may
ask whether the norm of $x$ may be estimated as well. This depends on the assumptions made (see
the literature described below). However, note that as $\|x\|_2$ grows, the logistic regression model
quickly approaches the noiseless 1-bit compressed sensing model, in which knowledge of $\|x\|_2$ is
lost in the measurements. Thus, since we do not assume that $\|x\|_2$ is bounded, recovery of $\|x\|_2$
becomes impossible.

For concreteness, we specialized to logistic regression. But as mentioned in the introduction, 
the model (1.3) can be interpreted as the generalized linear model, so our results can be readily 
used for various problems in sparse binomial regression. Some of the recent work in sparse binomial 
regression includes the papers [18, 4, 23, 2, 21, 17, 13]. Let us point to the most directly comparable 
results. 

In [2, 4, 13, 18] the authors propose to estimate the coefficient vector (which in our notation
is $x$) by minimizing the negative log-likelihood plus an extra $\ell_1$ regularization term. Bunea [4]
considers the logistic regression model. She derives an accuracy bound for the estimate (in the $\ell_1$
norm) under a certain condition stabil and under a bound on the magnitude of the entries of $x$.
Similarly, Bach [2] and Kakade et al. [13] derive accuracy bounds (again in the $\ell_1$ norm) under
restrictive eigenvalue conditions. The most directly comparable result is given by Negahban et al.
[18]. There the authors show that if the measurement vectors $a_i$ have independent subgaussian
entries, $\|x\|_0 \le s$, and $\|x\|_2 \le 1$, then with high probability one has $\|\hat{x} - x\|_2 \le \delta$, provided that
the number of measurements is $m \ge C \delta^{-2} s \log n$. Their results apply to the generalized linear
model (1.3) under some assumptions on $\theta$.

One main novelty in this paper is that knowledge of the function $\theta$, which defines the model
family, is completely unnecessary when recovering the coefficient vector. Indeed, the optimization
problems (1.7) and (3.2) do not need to know $\theta$. This stands in contrast to programs based on
maximum likelihood estimation. This may be of interest in non-parametric statistical applications
in which it is unclear which binary model to pick; the logistic model may be chosen somewhat
arbitrarily.

Another difference between our results and those above is in the conditions required. The above
papers allow for more general design matrices than those in this paper, but this necessarily leads
to strong assumptions on $\|x\|_2$. As the inner products $\langle a_i, x\rangle$ grow large, the logistic
regression model approaches the 1-bit compressed sensing model. However, as shown in [20],
accurate 1-bit compressed sensing is impossible for discrete measurement ensembles (not only is
it impossible to recover $x$, it is also impossible to recover $x/\|x\|_2$). Thus the above results, all
of which do allow for discrete measurement ensembles, necessitate rather strong conditions on
the magnitude of $\langle a_i, x\rangle$, or equivalently, on $\|x\|_2$; these are made explicitly in [4, 13, 18] and
implicitly in [2]. In contrast, our theoretical bounds on the relative error only improve as the
average magnitude of $\langle a_i, x\rangle$ increases.

3.3. Low-rank matrix recovery. We quickly mention that our model applies to single-bit
measurements of a low-rank matrix. Perhaps the closest practical application is quantum state
tomography [8], but still, the requirement of Gaussian measurements is somewhat unrealistic. Thus, the
purpose of this section is to give an intuition and a benchmark. The first author and Davenport
are in the process of writing a paper that encompasses 1-bit quantum state tomography and 1-bit
matrix completion.

Let $X \in \mathbb{R}^{n_1 \times n_2}$ be a matrix of interest with rank $r$ and Frobenius norm $\|X\|_F = 1$. Consider
that we have $m$ single-bit measurements following the model in the introduction so that $n = n_1 \times n_2$.
Similarly to sparse vectors, the set of low-rank matrices is not convex, but has a natural convex
relaxation as follows. Let

$$K_{n_1,n_2,r} = \{X \in \mathbb{R}^{n_1 \times n_2} : \|X\|_* \le \sqrt{r},\ \|X\|_F \le 1\}$$

where $\|X\|_*$ denotes the nuclear norm, i.e., the sum of the singular values of $X$.

In order to apply Theorems 1.1 and 1.3, we only need to calculate $w(K_{n_1,n_2,r})$, as follows:

$$w(K_{n_1,n_2,r}) = 2\, \mathbb{E} \sup_{X \in K_{n_1,n_2,r}} \langle G, X \rangle$$

where $G$ is a matrix with standard normal entries and the inner product above is the standard
entrywise inner product, i.e. $\langle G, X \rangle = \sum_{i,j} G_{i,j} X_{i,j}$. Since the nuclear norm and operator norm
are dual to each other, we have $\langle G, X \rangle \le \|G\| \cdot \|X\|_*$. Further, for each $X \in K_{n_1,n_2,r}$, $\|X\|_* \le \sqrt{r}$,
and thus

$$w(K_{n_1,n_2,r}) \le 2\sqrt{r}\, \mathbb{E}\|G\|.$$

The expected norm of a Gaussian matrix is well studied; one has $\mathbb{E}\|G\| \le \sqrt{n_1} + \sqrt{n_2}$ (see e.g.,
[24, Theorem 5.32]). Thus, $w(K_{n_1,n_2,r}) \le 2(\sqrt{n_1} + \sqrt{n_2})\sqrt{r}$. It follows that $O((n_1 + n_2)r)$ noiseless
1-bit measurements are sufficient to guarantee accurate recovery of rank-$r$ matrices. We note that
this matches the number of linear (infinite bit precision) measurements required in the low-rank
matrix recovery literature (see [5]).
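The bound on the expected operator norm used above can be checked empirically. The sketch below estimates $\|G\|$ by power iteration and compares the average with $\sqrt{n_1} + \sqrt{n_2}$; the helper names, the dimensions, the number of trials and the seed are all illustrative assumptions.

```python
import math
import random

def operator_norm(G, iters=100):
    """Estimate the largest singular value of G by power iteration on G^T G."""
    n1, n2 = len(G), len(G[0])
    v = [1.0 / math.sqrt(n2)] * n2          # unit starting vector
    sigma = 0.0
    for _ in range(iters):
        u = [sum(G[i][j] * v[j] for j in range(n2)) for i in range(n1)]  # u = G v
        sigma = math.sqrt(sum(t * t for t in u))   # ||G v||_2, a lower bound on ||G||
        w = [sum(G[i][j] * u[i] for i in range(n1)) for j in range(n2)]  # w = G^T u
        norm_w = math.sqrt(sum(t * t for t in w))
        v = [t / norm_w for t in w]
    return sigma

def mean_operator_norm(n1, n2, trials=5, seed=0):
    """Average the estimated operator norm over a few Gaussian matrices."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        G = [[rng.gauss(0.0, 1.0) for _ in range(n2)] for _ in range(n1)]
        total += operator_norm(G)
    return total / trials

n1, n2, r = 30, 20, 2
avg_norm = mean_operator_norm(n1, n2)
bound = math.sqrt(n1) + math.sqrt(n2)
print(f"average ||G|| ~ {avg_norm:.3f}, sqrt(n1) + sqrt(n2) = {bound:.3f}")
print(f"(n1 + n2) * r = {(n1 + n2) * r} versus n1 * n2 = {n1 * n2} matrix entries")
```

The last line illustrates the point of the section: the measurement count scales with $(n_1 + n_2)r$ rather than with the number $n_1 n_2$ of matrix entries.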

3.4. Extension to measurements with correlated entries. In this section we present an
extension of our results to independent measurement vectors with correlated entries, namely $a_i \sim N(0, \Sigma)$
where $\Sigma$ is a given covariance matrix. Let $\lambda_{\min}(\Sigma)$ and $\lambda_{\max}(\Sigma)$ denote the smallest and largest
eigenvalues of $\Sigma$; the condition number of $\Sigma$ is then $\kappa(\Sigma) = \lambda_{\max}(\Sigma)/\lambda_{\min}(\Sigma)$. It will be convenient
to choose the normalization $\|\Sigma^{1/2} x\|_2 = 1$; as before, this may be done by absorbing a constant
into the definition of $\theta$.




We propose the following generalization of the convex program (3.2):

$$\max \sum_{i=1}^m y_i \langle a_i, x' \rangle \quad \text{subject to} \quad \|x'\|_1 \le \sqrt{s/\lambda_{\min}(\Sigma)} \ \text{ and } \ \|\Sigma^{1/2} x'\|_2 \le 1. \tag{3.7}$$

The following result extends Corollary 3.1 to general covariance $\Sigma$. For simplicity we restrict
ourselves to exactly sparse signals; however the proof below allows for a more general signal set.

Corollary 3.5. Let $a_1, \ldots, a_m$ be independent random vectors with distribution $N(0, \Sigma)$. Fix $x$
satisfying $\|x\|_0 \le s$ and $\|\Sigma^{1/2} x\|_2 = 1$. Assume that the measurements $y_1, \ldots, y_m$ follow the model
from Section 1.3. Let $\delta > 0$ and suppose that

$$m \ge C \kappa(\Sigma)\, \delta^{-2} s \log(2n/s).$$

Then with probability at least $1 - 8\exp(-c\delta^2 m)$, the solution $\hat{x}$ to the convex program (3.7) satisfies

$$\lambda_{\min}(\Sigma) \cdot \|\hat{x} - x\|_2^2 \le \|\Sigma^{1/2} \hat{x} - \Sigma^{1/2} x\|_2^2 \le \delta/\lambda.$$

Remark 3.6. Theorem 1.3 can be generalized in the same way: the number of measurements
required is scaled by $\kappa(\Sigma)$ and the error bound is scaled by $\lambda_{\min}(\Sigma)^{-1}$.

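Proof of Corollary 3.5. The feasible set in (3.7) is $K := \{x \in \mathbb{R}^n : \|x\|_1 \le \sqrt{s/\lambda_{\min}(\Sigma)},\ \|\Sigma^{1/2} x\|_2 \le 1\}$.
Note that the signal $x$ considered in the statement of the corollary is feasible, since

$$\|x\|_1 \le \|x\|_2 \sqrt{\|x\|_0} \le \|\Sigma^{-1/2}\| \cdot \|\Sigma^{1/2} x\|_2 \sqrt{s} \le \sqrt{s/\lambda_{\min}(\Sigma)}.$$

Define $\tilde{a}_i := \Sigma^{-1/2} a_i$; then $\tilde{a}_i$ are independent standard normal vectors and $\langle a_i, x \rangle = \langle \Sigma^{1/2} \tilde{a}_i, x \rangle = \langle \tilde{a}_i, \Sigma^{1/2} x \rangle$.
Thus, it follows from Corollary 1.2 applied with $\Sigma^{1/2} x$ replacing $x$ that if

$$m \ge C \delta^{-2} w(\Sigma^{1/2} K)^2$$

then with probability at least $1 - 8\exp(-c\delta^2 m)$

$$\|\Sigma^{1/2} \hat{x} - \Sigma^{1/2} x\|_2^2 \le \delta/\lambda.$$

It remains to bound $w(\Sigma^{1/2} K)$. Since $\Sigma^{1/2}/\|\Sigma^{1/2}\|$ acts as a contraction, Slepian's inequality
(see [16, Corollary 3.14]) gives $w(\Sigma^{1/2} K) \le \|\Sigma^{1/2}\| \cdot w(K) = \lambda_{\max}(\Sigma)^{1/2} \cdot w(K)$. Further,
$K \subset \lambda_{\min}(\Sigma)^{-1/2} K_{n,s}$. Thus, it follows from (3.3) that $w(\Sigma^{1/2} K)^2 \le \kappa(\Sigma)\, w(K_{n,s})^2 \le C \kappa(\Sigma)\, s \log(2n/s)$.
This completes the proof. □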
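The feasibility step at the start of this proof can be illustrated in the simplest setting of a diagonal covariance matrix, where $\Sigma^{1/2}$ and $\lambda_{\min}(\Sigma)$ are read off the diagonal. The diagonal restriction, the helper name `normalize_and_check` and the numbers below are assumptions made only for this sketch.

```python
import math

def normalize_and_check(x, sigma_diag):
    """Rescale x so that ||Sigma^{1/2} x||_2 = 1 for a diagonal Sigma, then return
    (||x||_1, sqrt(s / lambda_min)) for the rescaled vector."""
    s = sum(1 for v in x if v != 0.0)                 # sparsity ||x||_0
    lam_min = min(sigma_diag)                         # smallest eigenvalue of a diagonal Sigma
    scale = math.sqrt(sum(d * v * v for d, v in zip(sigma_diag, x)))  # ||Sigma^{1/2} x||_2
    x_normalized = [v / scale for v in x]
    l1_norm = sum(abs(v) for v in x_normalized)
    return l1_norm, math.sqrt(s / lam_min)

sigma_diag = [0.5, 1.0, 2.0, 4.0, 1.0, 3.0]          # hypothetical eigenvalues of Sigma
x = [1.0, 0.0, -2.0, 0.0, 0.5, 0.0]                  # a 3-sparse signal before normalization
l1_norm, l1_budget = normalize_and_check(x, sigma_diag)
kappa = max(sigma_diag) / min(sigma_diag)            # condition number kappa(Sigma)
print(f"||x||_1 = {l1_norm:.4f} <= sqrt(s/lambda_min) = {l1_budget:.4f}, kappa = {kappa}")
```

Here the normalized signal has $\|x\|_1 = 3.5/\sqrt{8.75}$, which lies well inside the $\ell_1$ budget $\sqrt{3/0.5}$ of the program (3.7).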



4. Deducing Theorems 1.1 and 1.3 from concentration inequalities 

In this section we show how to deduce our main results, Theorems 1.1 and 1.3, from concentration
inequalities. These inequalities are stated in Propositions 4.2 and 4.3 below, whose proofs are
deferred to Sections 5 and 6 respectively.

4.1. Proof of Theorem 1.1. Consider the rescaled objective function from the program (1.7):

$$f_x(x') = \frac{1}{m} \sum_{i=1}^m y_i \langle a_i, x' \rangle. \tag{4.1}$$

Here the subscript $x$ indicates that $f$ is a random function whose distribution depends on $x$ through
$y_i$. Note that the solution $\hat{x}$ to the program (1.7) satisfies $f_x(\hat{x}) \ge f_x(x)$, since $x$ is feasible. We
claim that for any $x' \in K$ which is far away from $x$, the value $f_x(x')$ is small with high probability.
Thus $\hat{x}$ must be near to $x$.
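To begin to substantiate this claim, let us calculate $\mathbb{E} f_x(x')$ for a fixed vector $x' \in K$.

Lemma 4.1 (Expectation). Fix $x' \in B_2^n$. Then

$$\mathbb{E} f_x(x') = \lambda \langle x, x' \rangle$$

and thus

$$\mathbb{E}[f_x(x) - f_x(x')] = \lambda (1 - \langle x, x' \rangle) \ge \frac{\lambda}{2} \|x - x'\|_2^2.$$

Proof. We have

$$\mathbb{E} f_x(x') = \frac{1}{m} \sum_{i=1}^m \mathbb{E}\, y_i \langle a_i, x' \rangle = \mathbb{E}\, y_1 \langle a_1, x' \rangle.$$

Now we condition on $a_1$ to give

$$\mathbb{E}\, y_1 \langle a_1, x' \rangle = \mathbb{E}\, \mathbb{E}[y_1 \langle a_1, x' \rangle \mid a_1] = \mathbb{E}\, \theta(\langle a_1, x \rangle) \langle a_1, x' \rangle.$$

Note that $\langle a_1, x \rangle$ and $\langle a_1, x' \rangle$ are a pair of normal random variables with covariance $\langle x, x' \rangle$. Thus,
by taking $g, h \sim N(0, 1)$ to be independent, we may rewrite the above expectation as

$$\mathbb{E}\, \theta(g) \Big( \langle x, x' \rangle g + \big(\|x'\|_2^2 - \langle x, x' \rangle^2\big)^{1/2} h \Big) = \langle x, x' \rangle\, \mathbb{E}\, \theta(g) g = \lambda \langle x, x' \rangle$$

where the last equality follows from (1.4). Lemma 4.1 is proved. □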
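Lemma 4.1 lends itself to a quick Monte Carlo check. The sketch below treats the noiseless case $\theta = \mathrm{sign}$, for which $\lambda = \mathbb{E}|g| = \sqrt{2/\pi}$, and compares a sample average of $y_1 \langle a_1, x' \rangle$ with $\lambda \langle x, x' \rangle$. The vectors, the sample size and the seed are arbitrary choices for illustration.

```python
import math
import random

def inner(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def estimate_expectation(x, x_prime, samples=50000, seed=1):
    """Monte Carlo estimate of E[y <a, x'>] with y = sign(<a, x>) and a ~ N(0, I_n)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        a = [rng.gauss(0.0, 1.0) for _ in x]
        y = 1.0 if inner(a, x) >= 0 else -1.0   # noiseless 1-bit measurement
        total += y * inner(a, x_prime)
    return total / samples

x = [1.0, 0.0, 0.0, 0.0]              # unit-norm signal
x_prime = [0.6, 0.0, -0.5, 0.0]       # a point of the unit ball
lam = math.sqrt(2.0 / math.pi)        # lambda for theta = sign
estimate = estimate_expectation(x, x_prime)
predicted = lam * inner(x, x_prime)
gap = lam * (1.0 - inner(x, x_prime))                                   # E[f_x(x) - f_x(x')]
quadratic = 0.5 * lam * sum((p - q) ** 2 for p, q in zip(x, x_prime))   # (lambda/2)||x - x'||^2
print(f"estimate={estimate:.4f}  predicted={predicted:.4f}  gap={gap:.4f} >= {quadratic:.4f}")
```

The second printed comparison is the deterministic inequality in the lemma, which holds because $\|x\|_2 = 1$ and $\|x'\|_2 \le 1$.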

Next we show that $f_x(x')$ does not deviate far from its expectation uniformly for all $x' \in K$.

Proposition 4.2 (Concentration). For each $t > 0$, we have

$$\mathbb{P}\Big\{ \sup_{z \in K-K} |f_x(z) - \mathbb{E} f_x(z)| \ge 4w(K)/\sqrt{m} + t \Big\} \le 4\exp(-mt^2/8).$$

This result is proved in Section 5 using standard techniques of probability in Banach spaces.
Theorem 1.1 is a direct consequence of this proposition.

Proof of Theorem 1.1. Let $t > 0$. By Proposition 4.2, the following event occurs with probability
at least $1 - 4\exp(-mt^2/8)$:

$$\sup_{z \in K-K} |f_x(z) - \mathbb{E} f_x(z)| \le 4w(K)/\sqrt{m} + t.$$

Suppose the above event indeed occurs. Let us apply this inequality for $z = \hat{x} - x \in K - K$. By
definition of $\hat{x}$, we have $f_x(\hat{x}) \ge f_x(x)$. Noting that the function $f_x(z)$ is linear in $z$, we obtain

$$0 \le f_x(\hat{x}) - f_x(x) = f_x(\hat{x} - x) \le \mathbb{E}[f_x(\hat{x} - x)] + 4w(K)/\sqrt{m} + t \le -\frac{\lambda}{2} \|\hat{x} - x\|_2^2 + 4w(K)/\sqrt{m} + t. \tag{4.2}$$

The last inequality follows from Lemma 4.1. Finally, we choose $t = 4\beta/\sqrt{m}$ and rearrange terms
to complete the proof of Theorem 1.1. □
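To see the argument at work, consider the special case $K = B_2^n$, chosen here only because the maximizer of the linear objective over the Euclidean unit ball has the closed form $\hat{x} = v/\|v\|_2$ with $v = \sum_i y_i a_i$. Rearranging (4.2) with $t = 4\beta/\sqrt{m}$ gives $\|\hat{x} - x\|_2^2 \le 8(w(K) + \beta)/(\lambda\sqrt{m})$, and for the ball $w(K) = \mathbb{E}\sup_{z \in K-K}\langle g, z\rangle = 2\,\mathbb{E}\|g\|_2 \le 2\sqrt{n}$. The sketch below compares the observed error with this bound for noiseless sign measurements; the dimensions, $\beta$ and the seed are illustrative assumptions.

```python
import math
import random

def one_bit_estimate_l2_ball(A, y):
    """Maximize sum_i y_i <a_i, x'> over the Euclidean unit ball:
    the maximizer is v / ||v||_2 with v = sum_i y_i a_i."""
    n = len(A[0])
    v = [0.0] * n
    for a, yi in zip(A, y):
        for j in range(n):
            v[j] += yi * a[j]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]

rng = random.Random(0)
n, m = 10, 2000
x = [1.0 / math.sqrt(n)] * n                                   # unit-norm signal
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [1.0 if sum(ai * xi for ai, xi in zip(a, x)) >= 0 else -1.0 for a in A]

x_hat = one_bit_estimate_l2_ball(A, y)
err_sq = sum((p - q) ** 2 for p, q in zip(x_hat, x))

lam = math.sqrt(2.0 / math.pi)           # lambda for noiseless sign measurements
beta = 2.0                               # failure probability at most 4 * exp(-2 * beta**2)
mean_width_bound = 2.0 * math.sqrt(n)    # upper bound on w(B_2^n)
bound = 8.0 * (mean_width_bound + beta) / (lam * math.sqrt(m))
print(f"squared error = {err_sq:.4f}, bound from (4.2) = {bound:.4f}")
```

In this low-dimensional example the observed error is far below the bound, which is expected since the bound is designed to hold for general sets $K$.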

4.2. Proof of Theorem 1.3. The argument is similar to that of Theorem 1.1 given above. We
consider the rescaled objective functions with corrupted and uncorrupted measurements:

$$\tilde{f}_x(x') = \frac{1}{m} \sum_{i=1}^m \tilde{y}_i \langle a_i, x' \rangle, \qquad f_x(x') = \frac{1}{m} \sum_{i=1}^m y_i \langle a_i, x' \rangle = \frac{1}{m} \sum_{i=1}^m \operatorname{sign}(\langle a_i, x \rangle) \langle a_i, x' \rangle. \tag{4.3}$$

Arguing as in Lemma 4.1 (with $\theta(z) = \operatorname{sign}(z)$), we have

$$\mathbb{E} f_x(x') = \lambda \langle x, x' \rangle = \mathbb{E}|g| \cdot \langle x, x' \rangle = \sqrt{2/\pi}\, \langle x, x' \rangle. \tag{4.4}$$

Similarly to the proof of Theorem 1.1, we now need to show that $\tilde{f}_x(x')$ does not deviate far from
the expectation of $f_x(x')$; but this time the result should hold uniformly over not only $x' \in K - K$
but also $x \in K$ and $\tilde{y}$ with small Hamming distance to $y$. This is the content of the following
proposition.



Proposition 4.3 (Uniform Concentration). Let $\delta > 0$ and suppose that

$$m \ge C \delta^{-6} w(K)^2.$$

Then with probability at least $1 - 8\exp(-c\delta^2 m)$, we have

$$\sup \big| \tilde{f}_x(z) - \mathbb{E} f_x(z) \big| \le \delta \sqrt{\log(e/\delta)} + 4\tau \sqrt{\log(e/\tau)} \tag{4.5}$$

where the supremum is taken over $x \in K \cap S^{n-1}$, $z \in K - K$ and $\tilde{y}$ satisfying $d_H(\tilde{y}, y) \le \tau m$.

This result is significantly deeper than Proposition 4.2. It is based on a recent geometric result
from [19] on random tessellations of sets on the sphere. The proof is given in Section 6.
Theorem 1.3 now follows from the same steps as in the proof of Theorem 1.1 above. □

Remark 4.4 (Random noise). The adversarial bit flips allowed in Theorem 1.3 can be combined
with random noise. We considered two models of random noise in Section 3.1. One was random bit
flips, where one would take $y_i = \xi_i \operatorname{sign}(\langle a_i, x \rangle)$; here $\xi_i$ are iid Bernoulli random variables
with values in $\{-1, 1\}$ satisfying $\mathbb{P}\{\xi_i = 1\} = p$. The proof of Theorem 1.3 would remain unchanged
under this model, aside from the calculation

$$\mathbb{E} f_x(x') = \sqrt{2/\pi}\,(2p - 1)\, \langle x, x' \rangle.$$

The end result is that the error bound in (1.10) would be divided by $2p - 1$.

Another model considered in Section 3.1 was random noise before quantization. Thus we let

$$y_i = \operatorname{sign}(\langle a_i, x \rangle + g_i) \tag{4.6}$$

where $g_i \sim N(0, \sigma^2)$ are iid. Once again, a slight modification of the proof of Theorem 1.3 allows
the incorporation of such noise. Note that the above model is equivalent to $y_i = \operatorname{sign}(\langle \tilde{a}_i, \tilde{x} \rangle)$ where
$\tilde{a}_i = (a_i, g_i/\sigma)$ and $\tilde{x} = (x, \sigma)$ (we have concatenated an extra entry onto $a_i$ and $x$). Thus, by slightly
adjusting the set $K$, we are back in our original model.
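The effect of random bit flips on the correlation coefficient can be checked by simulation. The sketch below estimates $\mathbb{E}[\xi\, \operatorname{sign}(g)\, g]$, where $\xi = 1$ with probability $p$ and $\xi = -1$ otherwise, and compares it with $(2p - 1)\sqrt{2/\pi}$. The function name, the sample size and the seed are assumptions of this illustration.

```python
import math
import random

def flipped_correlation(p, samples=100000, seed=2):
    """Monte Carlo estimate of E[xi * sign(g) * g], where g ~ N(0,1) and
    xi = +1 with probability p, -1 otherwise, independently of g."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        g = rng.gauss(0.0, 1.0)
        xi = 1.0 if rng.random() < p else -1.0
        total += xi * abs(g)          # sign(g) * g equals |g|
    return total / samples

for p in (0.5, 0.75, 0.9, 1.0):
    predicted = (2.0 * p - 1.0) * math.sqrt(2.0 / math.pi)
    print(f"p={p:.2f}  estimate={flipped_correlation(p):.4f}  predicted={predicted:.4f}")
```

At $p = 1/2$ the measurements carry no information and the correlation vanishes, which matches the division by $2p - 1$ in the error bound.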

5. Concentration: proof of Proposition 4.2 
Here we prove the concentration inequality given by Proposition 4.2. 

5.1. Tools: symmetrization and Gaussian concentration. The argument is based on the
standard techniques of probability in Banach spaces: symmetrization and the Gaussian concentration
inequality. Let us recall both these tools.

Lemma 5.1 (Symmetrization). Let $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m$ be independent symmetric Bernoulli valued
random variables.$^1$ Then

$$\mathbb{E} \sup_{z \in K-K} |f_x(z) - \mathbb{E} f_x(z)| \le 2\, \mathbb{E} \sup_{z \in K-K} \Big| \frac{1}{m} \sum_{i=1}^m \varepsilon_i y_i \langle a_i, z \rangle \Big|. \tag{5.1}$$

Furthermore, we have the deviation inequality

$$\mathbb{P}\Big\{ \sup_{z \in K-K} |f_x(z) - \mathbb{E} f_x(z)| \ge 2\, \mathbb{E} \sup_{z \in K-K} |f_x(z) - \mathbb{E} f_x(z)| + t \Big\} \le 4\, \mathbb{P}\Big\{ \sup_{z \in K-K} \Big| \frac{1}{m} \sum_{i=1}^m \varepsilon_i y_i \langle a_i, z \rangle \Big| > t/2 \Big\}. \tag{5.2}$$

$^1$ This means that $\mathbb{P}\{\varepsilon_i = 1\} = \mathbb{P}\{\varepsilon_i = -1\} = 1/2$ for each $i$. The random variables $\varepsilon_i$ are assumed to be
independent of each other and of any other random variables in question, namely $a_i$ and $y_i$.

Inequality (5.1) follows e.g., from the proof of [16, Lemma 6.3]. The proof of inequality (5.2) is
contained in [16, Chapter 6.1]. □



Theorem 5.2 (Gaussian concentration inequality). Let $(G_x)_{x \in T}$ be a centered Gaussian process
indexed by a finite set $T$. Then for every $r > 0$ one has

$$\mathbb{P}\Big\{ \sup_{x \in T} G_x \ge \mathbb{E} \sup_{x \in T} G_x + r \Big\} \le \exp(-r^2/2\sigma^2)$$

where $\sigma^2 = \sup_{x \in T} \mathbb{E}\, G_x^2 < \infty$.

A proof of this result can be found e.g. in [15, Theorem 7.1]. □

This theorem can be extended to separable sets $T$ in metric spaces by an approximation argument.
In particular, given a set $K \subset B_2^n$, the standard Gaussian random vector $g$ in $\mathbb{R}^n$ satisfies

$$\mathbb{P}\Big\{ \sup_{x \in K-K} \langle g, x \rangle \ge w(K) + r \Big\} \le \exp(-r^2/2), \quad r > 0. \tag{5.3}$$

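5.2. Proof of Proposition 4.2. We apply the first part (5.1) of Symmetrization Lemma 5.1. Note
that since $a_i$ have symmetric distributions and $y_i \in \{-1, 1\}$, the random vectors $\varepsilon_i y_i a_i$ have the
same (iid) distribution as $a_i$. Using the rotational invariance and the symmetry of the Gaussian
distribution, we can represent the right hand side of (5.1) as

$$\sup_{z \in K-K} \Big| \frac{1}{m} \sum_{i=1}^m \varepsilon_i y_i \langle a_i, z \rangle \Big| \overset{dist}{=} \sup_{z \in K-K} \frac{1}{m} \Big| \sum_{i=1}^m \langle a_i, z \rangle \Big| \overset{dist}{=} \frac{1}{\sqrt{m}} \sup_{z \in K-K} |\langle g, z \rangle| = \frac{1}{\sqrt{m}} \sup_{z \in K-K} \langle g, z \rangle \tag{5.4}$$

where $\overset{dist}{=}$ signifies the equality in distribution. So taking the expectation in (5.1) we obtain

$$\mathbb{E} \sup_{z \in K-K} |f_x(z) - \mathbb{E} f_x(z)| \le \frac{2}{\sqrt{m}}\, \mathbb{E} \sup_{z \in K-K} \langle g, z \rangle = \frac{2 w(K)}{\sqrt{m}}. \tag{5.5}$$

To supplement this expectation bound with a deviation inequality, we use the second part (5.2)
of Symmetrization Lemma 5.1 along with (5.5) and (5.4). This yields

$$\mathbb{P}\Big\{ \sup_{z \in K-K} |f_x(z) - \mathbb{E} f_x(z)| \ge \frac{4 w(K)}{\sqrt{m}} + t \Big\} \le 4\, \mathbb{P}\Big\{ \frac{1}{\sqrt{m}} \sup_{z \in K-K} \langle g, z \rangle > t/2 \Big\}.$$

Now it remains to use the Gaussian concentration inequality (5.3) with $r = t\sqrt{m}/2$. The proof of
Proposition 4.2 is complete. □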



6. Concentration: proof of Proposition 4.3 

Here we prove the uniform concentration inequality given by Proposition 4.3. Besides standard
tools in geometric functional analysis such as Sudakov minoration for covering numbers, our
argument is based on the recent work [19] on random hyperplane tessellations. Let us first recall the
relevant tools.




6.1. Tools: covering numbers, almost isometries and random tessellations. Consider a
set $T \subset \mathbb{R}^n$ and a number $\varepsilon > 0$. Recall that an $\varepsilon$-net of $T$ (in the Euclidean norm) is a set $N_\varepsilon \subset T$
which has the following property: for every $x \in T$ there exists $\bar{x} \in N_\varepsilon$ satisfying $\|x - \bar{x}\|_2 \le \varepsilon$.
The covering number of $T$ to precision $\varepsilon$, which we call $N(T, \varepsilon)$, is the minimal cardinality of an
$\varepsilon$-net of $T$. The covering numbers are closely related to the mean width, as shown by the following
well-known inequality:

Theorem 6.1 (Sudakov Minoration). Given a set $K \subset \mathbb{R}^n$ and a number $\varepsilon > 0$, one has

$$\log N(K, \varepsilon) \le \log N(K - K, \varepsilon) \le C \varepsilon^{-2} w(K)^2.$$

A proof of this theorem can be found e.g. in [16, Theorem 3.18]. □
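The notion of an $\varepsilon$-net used here can be made concrete with a greedy construction on a finite point set. The sketch below is a standard greedy procedure, not a construction from the paper; the point cloud on the unit circle, the value of $\varepsilon$ and the seed are illustrative. The size of the resulting net is an upper bound on the covering number of the finite set.

```python
import math
import random

def greedy_net(points, eps):
    """Greedy eps-net: keep a point only if it is farther than eps
    from every point kept so far."""
    net = []
    for p in points:
        if all(math.dist(p, q) > eps for q in net):
            net.append(p)
    return net

rng = random.Random(4)
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(500)]
T = [(math.cos(t), math.sin(t)) for t in angles]      # finite subset of the unit circle
eps = 0.25
net = greedy_net(T, eps)

# Every point of T lies within eps of the net, and net points are eps-separated.
covered = all(min(math.dist(p, q) for q in net) <= eps for p in T)
separated = all(math.dist(p, q) > eps for i, p in enumerate(net) for q in net[:i])
print(f"net size = {len(net)}, covered = {covered}, separated = {separated}")
```

Because the net points are pairwise more than $\varepsilon$ apart along a circle of length $2\pi$, the net can contain at most $2\pi/\varepsilon$ points, a simple instance of how geometric size controls covering numbers.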

We will also need two results from the recent work [19]. To state them conveniently, let $A$ denote
the $m \times n$ matrix with rows $a_i$. Thus $A$ is a standard Gaussian matrix with iid standard normal
entries. The first (simple) result guarantees that $A$ acts as an almost isometric embedding from
$(K, \|\cdot\|_2)$ into $(\mathbb{R}^m, \|\cdot\|_1)$.

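Lemma 6.2 ([19] Lemma 2.1). Consider a subset $K \subset B_2^n$. Then for every $u > 0$ one has

$$\mathbb{P}\Big\{ \sup_{x \in K-K} \Big| \frac{1}{m} \|Ax\|_1 - \sqrt{\frac{2}{\pi}}\, \|x\|_2 \Big| \ge \frac{4 w(K)}{\sqrt{m}} + u \Big\} \le 2 \exp\Big( -\frac{m u^2}{2} \Big).$$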



The second (not simple) result demonstrates that the discrete map $x \mapsto \operatorname{sign}(Ax)$ acts as an
almost isometric embedding from $(K \cap S^{n-1}, d_G)$ into the Hamming cube $(\{-1, 1\}^m, d_H)$. Here $d_G$
and $d_H$ denote the geodesic and Hamming metrics respectively; the sign function is applied to all
entries of $Ax$, thus $\operatorname{sign}(Ax)$ is the vector with entries $\operatorname{sign}(\langle a_i, x \rangle)$, $i = 1, \ldots, m$.

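Theorem 6.3 ([19] Theorem 1.5). Consider a subset $K \subset B_2^n$ and let $\delta > 0$. Let

$$m \ge C \delta^{-6} w(K)^2.$$

Then with probability at least $1 - 2\exp(-c\delta^2 m)$, the following holds for all $x, x' \in K \cap S^{n-1}$:

$$\Big| \frac{1}{\pi}\, d_G(x, x') - \frac{1}{m}\, d_H(\operatorname{sign}(Ax), \operatorname{sign}(Ax')) \Big| \le \delta.$$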
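For a single fixed pair of points, the relation between the two metrics can be observed directly: a Gaussian hyperplane separates two unit vectors with probability equal to their angle divided by $\pi$. The sketch below checks this pointwise fact (not the uniform statement of the theorem) by simulation; the dimension, the angle, the number of hyperplanes and the seed are illustrative assumptions.

```python
import math
import random

def normalized_hamming(x, x_prime, m=20000, seed=3):
    """Fraction of m random Gaussian hyperplanes whose sign patterns differ on x and x'."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in x]
        sign_x = sum(ai * xi for ai, xi in zip(a, x)) >= 0
        sign_xp = sum(ai * xi for ai, xi in zip(a, x_prime)) >= 0
        if sign_x != sign_xp:
            disagreements += 1
    return disagreements / m

theta = 1.0                                        # geodesic distance (angle) between the points
x = [1.0, 0.0, 0.0]
x_prime = [math.cos(theta), math.sin(theta), 0.0]
hamming = normalized_hamming(x, x_prime)
print(f"d_H / m = {hamming:.4f}, d_G / pi = {theta / math.pi:.4f}")
```

The theorem is much stronger than this check, since it controls the discrepancy simultaneously for all pairs in $K \cap S^{n-1}$.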



6.2. Proof of Proposition 4.3. Let us first assume that $\tau = 0$ for simplicity; the general case
will be discussed at the end of the proof. With this assumption, (4.3) becomes

$$\tilde{f}_x(z) = f_x(z) = \frac{1}{m} \sum_{i=1}^m \operatorname{sign}(\langle a_i, x \rangle) \langle a_i, z \rangle. \tag{6.1}$$

To be able to approximate $x$ by a net, we use Sudakov minoration (Theorem 6.1) and find a $\delta$-net
$N_\delta$ of $K \cap S^{n-1}$ whose cardinality satisfies

$$\log |N_\delta| \le C \delta^{-2} w(K)^2. \tag{6.2}$$

Lemma 6.4. Let $\delta > 0$ and assume that $m \ge C \delta^{-6} w(K)^2$. Then we have the following.

1. (Bound on the net.) With probability at least $1 - 4\exp(-cm\delta^2)$ we have

$$\sup_{x_0, z} |f_{x_0}(z) - \mathbb{E} f_{x_0}(z)| \le \delta \tag{6.3}$$

where the supremum is taken over all $x_0 \in N_\delta$ and all $z \in K - K$.

2. (Deviation of sign patterns.) For $x, x_0 \in K \cap S^{n-1}$, consider the set

$$T(x, x_0) := \{ i \in [m] : \operatorname{sign}(\langle a_i, x \rangle) \ne \operatorname{sign}(\langle a_i, x_0 \rangle) \}.$$

Then, with probability at least $1 - 2\exp(-cm\delta^2)$ we have

$$\sup_{x, x_0} |T(x, x_0)| \le 2m\delta \tag{6.4}$$

where the supremum is taken over all $x, x_0 \in K \cap S^{n-1}$ satisfying $\|x - x_0\|_2 \le \delta$.

3. (Deviation of sums.) Let $s$ be a natural number. With probability at least $1 - 2\exp(-s\log(em/s)/2)$
we have

$$\sup_{z, T} \sum_{i \in T} |\langle a_i, z \rangle| \le s \Big[ \sqrt{\frac{8}{\pi}} + \frac{4 w(K)}{\sqrt{s}} + 2\sqrt{\log(em/s)} \Big] \tag{6.5}$$

where the supremum is taken over all $z \in K - K$ and all subsets $T \subset [m]$ with cardinality $|T| \le s$.

Proof of Proposition 4.3. Let us apply Lemma 6.4, where in part 3 we take $s = 2m\delta$ rounded down
to the next smallest integer. Then all of the events (6.3), (6.4), (6.5) hold simultaneously with
probability at least $1 - 8\exp(-cm\delta^2)$.

Recall that our goal is to bound the deviation of $f_x(z)$ from its expectation uniformly over
$x \in K \cap S^{n-1}$ and $z \in K - K$. To this end, given $x \in K \cap S^{n-1}$ we choose $x_0 \in N_\delta$ so that
$\|x - x_0\|_2 \le \delta$. By (6.1) and definition of the set $T(x, x_0)$ in Lemma 6.4, we can approximate
$f_x(z)$ by $f_{x_0}(z)$ as follows:

$$|f_x(z) - f_{x_0}(z)| \le \frac{2}{m} \sum_{i \in T(x, x_0)} |\langle a_i, z \rangle|. \tag{6.6}$$

Furthermore, (6.4) guarantees that $|T(x, x_0)| \le 2m\delta$. It then follows from (6.5), our choice $s = 2m\delta$
and the assumption on $m$ that

$$\sum_{i \in T(x, x_0)} |\langle a_i, z \rangle| \le C m \delta \sqrt{\log(e/\delta)}.$$

Thus

$$|f_x(z) - f_{x_0}(z)| \le 2C \delta \sqrt{\log(e/\delta)}.$$

Combining this with (6.3) we obtain

$$|f_x(z) - \mathbb{E} f_{x_0}(z)| \le C_1 \delta \sqrt{\log(e/\delta)}. \tag{6.7}$$

Further, recall from (4.4) that $\mathbb{E} f_{x_0}(z) = \sqrt{2/\pi}\, \langle x_0, z \rangle$ and thus

$$|\mathbb{E} f_{x_0}(z) - \mathbb{E} f_x(z)| = \sqrt{2/\pi}\, |\langle x_0 - x, z \rangle| \le 2\sqrt{2/\pi}\, \|x_0 - x\|_2 \le 2\sqrt{2/\pi}\, \delta. \tag{6.8}$$

The last two inequalities in this line follow since $z \in K - K \subset 2B_2^n$ and $\|x_0 - x\|_2 \le \delta$. Finally, we
combine inequalities (6.7) and (6.8) to give

$$|f_x(z) - \mathbb{E} f_x(z)| \le C_2 \delta \sqrt{\log(e/\delta)}.$$

Note that we can absorb the constant $C_2$ into the requirement $m \ge C \delta^{-6} w(K)^2$. This completes
the proof in the case where $\tau = 0$.

In the general case, we only need to tweak the above argument by increasing the size of $s$
considered in (6.5). Specifically, it is enough to choose $s$ to be $\tau m + 2m\delta$ rounded down to the next
smallest integer. This allows one to account for arbitrary $\tau m$ bit flips of the numbers $\operatorname{sign}(\langle a_i, x \rangle)$,
which produce the difference between $\tilde{f}_x(z)$ and $f_x(z)$. The proof of Proposition 4.3 is complete. □
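The approximation step (6.6) is a deterministic inequality: the two objective functions differ only on the indices where the sign patterns of $x$ and $x_0$ disagree, and each such index contributes at most $2|\langle a_i, z \rangle|/m$. The sketch below verifies this on random data; the dimensions, the vectors and the seed are arbitrary illustrative choices.

```python
import math
import random

def inner(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def sign(t):
    return 1.0 if t >= 0 else -1.0

def f(A, x, z):
    """f_x(z) = (1/m) * sum_i sign(<a_i, x>) <a_i, z>, as in (6.1)."""
    return sum(sign(inner(a, x)) * inner(a, z) for a in A) / len(A)

def disagreement_bound(A, x, x0, z):
    """Return |T(x, x0)| and (2/m) * sum over i in T(x, x0) of |<a_i, z>|."""
    T = [a for a in A if sign(inner(a, x)) != sign(inner(a, x0))]
    return len(T), 2.0 * sum(abs(inner(a, z)) for a in T) / len(A)

rng = random.Random(5)
n, m = 4, 500
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x = [1.0, 0.0, 0.0, 0.0]
x0 = [math.cos(0.1), math.sin(0.1), 0.0, 0.0]     # a nearby point on the sphere
z = [0.3, -0.4, 0.5, 0.1]

lhs = abs(f(A, x, z) - f(A, x0, z))
size_T, rhs = disagreement_bound(A, x, x0, z)
print(f"|f_x(z) - f_x0(z)| = {lhs:.5f} <= {rhs:.5f}, |T(x, x0)| = {size_T} of m = {m}")
```

The probabilistic content of the proof lies in controlling $|T(x, x_0)|$ and the partial sums uniformly, which is what parts 2 and 3 of Lemma 6.4 provide.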

It remains to prove Lemma 6.4 that was used in the argument above. 
Proof of Lemma 6.4. To prove part 1, we can use Proposition 4.2 combined with the union bound
over the net $N_\delta$. Using the bound (6.2) on the cardinality of $N_\delta$ we obtain

$$\mathbb{P}\Big\{ \sup_{x_0, z} |f_{x_0}(z) - \mathbb{E} f_{x_0}(z)| \ge 4w(K)/\sqrt{m} + t \Big\} \le |N_\delta| \cdot 4\exp(-mt^2/8) \le 4\exp\big( -mt^2/8 + C\delta^{-2} w(K)^2 \big)$$

where the supremum is taken over all $x_0 \in N_\delta$ and $z \in K - K$. It remains to choose $t = \delta/2$ and
recall that $m \ge C\delta^{-6} w(K)^2$ to finish the proof.

We now turn to part 2. First, note that $|T(x, x_0)| = d_H(\operatorname{sign}(Ax), \operatorname{sign}(Ax_0))$. Theorem 6.3
demonstrates that this Hamming distance is almost isometric to the geodesic distance, which itself
satisfies $\frac{1}{\pi} d_G(x, x_0) \le \|x - x_0\|_2 \le \delta$. Specifically, Theorem 6.3 yields that under our assumption
that $m \ge C\delta^{-6} w(K)^2$, with probability at least $1 - 2\exp(-c\delta^2 m)$ one has

$$|T(x, x_0)| \le 2m\delta \tag{6.9}$$

for all $x, x_0 \in K \cap S^{n-1}$ satisfying $\|x - x_0\|_2 \le \delta$. This proves part 2.

In order to prove part 3, we may consider the subsets $T$ satisfying $|T| = s$; there are $\binom{m}{s} \le \exp(s \log(em/s))$
of them. Now we apply Lemma 6.2 for the $T \times n$ matrix $P_T A$ where $P_T$ denotes
the coordinate restriction in $\mathbb{R}^m$ onto $\mathbb{R}^T$; so in the statement of Lemma 6.2 we replace $m$ by $|T| = s$.
Combined with the union bound over all $T$, this gives

$$\mathbb{P}\Big\{ \sup_{z, T} \frac{1}{s} \sum_{i \in T} |\langle a_i, z \rangle| \ge \sqrt{\frac{2}{\pi}}\, \|z\|_2 + \frac{4 w(K)}{\sqrt{s}} + u \Big\} \le 2 \exp\Big( s \log(em/s) - \frac{s u^2}{2} \Big).$$

Recall that $\|z\|_2 \le 2$ since $z \in K - K \subset 2B_2^n$. Finally, we take $u^2 = 4\log(em/s)$ to complete the
proof. □
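The counting estimate $\binom{m}{s} \le \exp(s\log(em/s))$ used in the union bound of part 3 can be verified exactly for moderate $m$. The sketch below compares both sides on a logarithmic scale; the value $m = 1000$ and the helper names are illustrative.

```python
import math

def log_subset_count(m, s):
    """Natural logarithm of the number of s-element subsets of an m-element set."""
    return math.log(math.comb(m, s))

def entropy_bound(m, s):
    """The bound s * log(e * m / s) on the logarithm of the subset count."""
    return s * math.log(math.e * m / s)

m = 1000
for s in (1, 10, 100, 500, 1000):
    print(f"s={s:5d}  log C(m,s) = {log_subset_count(m, s):9.2f}  s*log(em/s) = {entropy_bound(m, s):9.2f}")
```

The bound is loose for $s$ close to $m$ but sharp enough for small $s$, which is the regime $s \approx 2m\delta$ relevant to the proof.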



7. Discussion 

Unlike traditional compressed sensing, which has already enjoyed an extraordinary wave of
theoretical results, 1-bit compressed sensing is in its early stages. In this paper, we proposed a
polynomial-time solver (given by a convex program) for noisy 1-bit compressed sensing, and we
gave theoretical guarantees on its performance. The discontinuity inherent in 1-bit measurements
led to some unique mathematical challenges. We also demonstrated the connection to sparse
binomial regression, and derived novel results for this problem as well.

The problem setup in 1-bit compressed sensing (as first defined in [3]) is quite elegant, allowing 
for a theoretical approach. On the other hand, there are many compressed sensing results assuming 
substantially finer quantization. It would be of interest to build a bridge between the two regimes; 
for example, 2-bit compressed sensing would be an entirely different problem. 